Comparison of parametric and machine methods for variable selection in simulated Genetic Analysis Workshop 19 data
نویسندگان
چکیده
Current findings from genetic studies of complex human traits often do not explain a large proportion of the estimated variation of these traits due to genetic factors. This could be, in part, due to overly stringent significance thresholds in traditional statistical methods, such as linear and logistic regression. Machine learning methods, such as Random Forests (RF), are an alternative approach to identify potentially interesting variants. One major issue with these methods is that there is no clear way to distinguish between probable true hits and noise variables based on the importance metric calculated. To this end, we are developing a method called the Relative Recurrency Variable Importance Metric (r2VIM), a RF-based variable selection method. Here, we apply r2VIM to the unrelated Genetic Analysis Workshop 19 data with simulated systolic blood pressure as the phenotype. We compare the number of "true" functional variants identified by r2VIM with those identified by linear regression analyses that use a Bonferroni correction to calculate a significance threshold. Our results show that r2VIM performed comparably to linear regression. Our findings are proof-of-concept for r2VIM, as it identifies a similar number of functional and nonfunctional variants as a more commonly used technique when the optimal importance score threshold is used.
منابع مشابه
Comparison of Parametric and Non-parametric EEG Feature Extraction Methods in Detection of Pediatric Migraine without Aura
Background: Migraine headache without aura is the most common type of migraine especially among pediatric patients. It has always been a great challenge of migraine diagnosis using quantitative electroencephalography measurements through feature classification. It has been proven that different feature extraction and classification methods vary in terms of performance regarding detection and di...
متن کاملBrief review of regression-based and machine learning methods in genetic epidemiology: the Genetic Analysis Workshop 17 experience.
Genetics Analysis Workshop 17 provided common and rare genetic variants from exome sequencing data and simulated binary and quantitative traits in 200 replicates. We provide a brief review of the machine learning and regression-based methods used in the analyses of these data. Several regression and machine learning methods were used to address different problems inherent in the analyses of the...
متن کاملApplication of genetic algorithm (GA) to select input variables in support vector machine (SVM) for analyzing the occurrence of roach, Rutilus rutilus, in streams
Support vector machine (SVM) was used to analyze the occurrence of roach in Flemish stream basins (Belgium). Several habitat and physico?chemical variables were used as inputs for the model development. The biotic variable merely consisted of abundance data which was used for predicting presence/absence of roach. Genetic algorithm (GA) was combined with SVM in order to select the most important...
متن کاملNon-parametric Machine Learning Methods for Clustering and Variable Selection
Qian Liu: Non-parametric machine learning methods for clustering and variable selection (Under the direction of Eric Bair) Non-parametric machine learning methods have been popular and widely used in many scientific research areas, especially when dealing with high-dimension low sample size (HDLSS) data. In particular, clustering and biclustering approaches can serve as exploratory analysis too...
متن کاملThe Impact of Different Genetic Architectures on Accuracy of Genomic Selection Using Three Bayesian Methods
Genome-wide evaluation uses the associations of a large number of single nucleotide polymorphism (SNP) markers across the whole genome and then combines the statistical methods with genomic data to predict the genetic values. Genomic predictions relieson linkage disequilibrium (LD) between genetic markers and quantitative trait loci (QTL) in a population. Methods that use all markers simultaneo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 10 شماره
صفحات -
تاریخ انتشار 2016